AITopics | memory budget

memory budget

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

InfiniPot-V: Memory-Constrained KVCache Compression for Streaming Video Understanding

Neural Information Processing Systems | Jun-22-2026, 18:40:24 GMT

Modern multimodal large language models (MLLMs) can reason over hour-long video, yet their key-value (KV) cache grows linearly with time, quickly exceeding the fixed memory of phones, AR glasses, and edge robots. Prior compression schemes either assume the whole video and user query are available offline or must first build the full cache, so memory still scales with stream length. InfiniPot-V is the first training-free, query-agnostic framework that enforces a hard, length-independent memory cap for streaming video understanding. During video encoding it monitors the cache and, once a user-set threshold is reached, runs a lightweight compression pass that (i) removes temporally redundant tokens via a Temporal-axis Redundancy (TaR) metric and (ii) keeps semantically significant tokens via Value-Norm (VaN) ranking. Across four open-source MLLMs and four long-video and streaming-video benchmarks, InfiniPot-V cuts peak GPU memory by up to 94%, sustains real-time generation, and matches or surpasses full-cache accuracy, even in multi-turn dialogues. By dissolving the KV cache bottleneck without retraining or query knowledge, InfiniPot-V closes the gap for on-device streaming video assistants.
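The threshold-triggered compression loop described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the abstract does not define the TaR metric, so the cosine-similarity check between consecutive keys is a stand-in assumption, and `BoundedCache`, `compress`, `budget`, `target`, and `sim_threshold` are hypothetical names. Only the value-norm ranking idea and the "compress once a threshold is reached" control flow come from the text.

```python
import math


def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def _cosine(a, b):
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def compress(cache, target, sim_threshold=0.95):
    """Shrink a list of (key, value) pairs, given in temporal order.

    Step (i), assumed stand-in for TaR: drop a token when its key is
    nearly identical to the key of the last token that was kept.
    Step (ii), value-norm ranking: if still above `target`, keep the
    tokens with the largest value norms, preserving temporal order.
    """
    kept = []
    for key, value in cache:
        if kept and _cosine(kept[-1][0], key) > sim_threshold:
            continue
        kept.append((key, value))
    if len(kept) > target:
        ranked = sorted(range(len(kept)),
                        key=lambda i: _norm(kept[i][1]), reverse=True)
        kept = [kept[i] for i in sorted(ranked[:target])]
    return kept


class BoundedCache:
    """KV cache whose length stays below `budget`, however long the stream."""

    def __init__(self, budget, target):
        assert 0 < target < budget
        self.budget = budget
        self.target = target
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))
        if len(self.entries) >= self.budget:
            self.entries = compress(self.entries, self.target)
```

Because compression runs as soon as the cache reaches `budget` and always returns at most `target` entries, the cache length is bounded independently of how many tokens are streamed in.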

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

1cbcaa5abbb6b70f378a3a03d0c26386-Paper.pdf

Neural Information Processing Systems | Apr-24-2026, 23:49:07 GMT

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.28)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Memory-Efficient Backpropagation Through Time

Audrunas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves

Neural Information Processing Systems | Mar-23-2026, 20:52:52 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Memory-Efficient Backpropagation Through Time

Neural Information Processing Systems | Mar-17-2026, 10:03:44 GMT

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of intermediate results and recomputation. The algorithm is capable of tightly fitting within almost any user-set memory budget while finding an optimal execution policy minimizing the computational cost. Computational devices have limited memory capacity, and maximizing computational performance under a fixed memory budget is a practical use case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95% of memory usage while using only one third more time per iteration than the standard BPTT.
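The caching-versus-recomputation trade-off can be illustrated with a small dynamic program. The sketch below uses a simplified cost model that is an assumption of this example, not a statement of the paper's exact formulation: cost is counted in forward steps, memory is counted in slots that each hold one hidden state, and `checkpoint_costs` is a hypothetical name. With one slot, every backward step restarts from the beginning of the sequence; with more slots, the policy runs forward `y` steps, stores a checkpoint, handles the right part with one slot fewer, then handles the left part with all slots.

```python
def checkpoint_costs(t_max, m_max):
    """Return (cost, split) tables indexed as [m][t].

    cost[m][t]  : minimum number of forward steps needed to backpropagate
                  through t time steps with m memory slots (assumed model).
    split[m][t] : the checkpoint position y that achieves that minimum
                  (0 where no split is made).
    """
    cost = [[0] * (t_max + 1) for _ in range(m_max + 1)]
    split = [[0] * (t_max + 1) for _ in range(m_max + 1)]

    # One slot: recompute from the start for every backward step.
    for t in range(1, t_max + 1):
        cost[1][t] = t * (t + 1) // 2

    for m in range(2, m_max + 1):
        cost[m][1] = 1
        for t in range(2, t_max + 1):
            best, best_y = None, 0
            for y in range(1, t):
                # y forward steps to reach the checkpoint, then the right
                # part with m-1 slots, then the left part with m slots.
                c = y + cost[m - 1][t - y] + cost[m][y]
                if best is None or c < best:
                    best, best_y = c, y
            cost[m][t] = best
            split[m][t] = best_y
    return cost, split
```

Under this model, `cost[m][t]` never increases as `m` grows, which mirrors the abstract's point that a larger memory budget buys less recomputation.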

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Coop: Memory is not a Commodity

Neural Information Processing Systems | Feb-16-2026, 03:15:53 GMT

To address this issue, we propose to evict tensors within a sliding window to ensure all evictions are contiguous and are immediately used.
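One way to picture eviction within a sliding window is a two-pointer scan over memory blocks laid out in address order. The sketch below is an illustration under assumptions, since the excerpt gives only one sentence: the `Block` layout, the per-block `cost` (imagined as a recomputation cost, 0.0 for a block that is already free), and the name `cheapest_window` are all hypothetical. It looks for the contiguous run of blocks that frees at least `need` bytes at the lowest total cost, so that everything evicted forms one contiguous region.

```python
from dataclasses import dataclass


@dataclass
class Block:
    size: int    # bytes occupied by this block
    cost: float  # assumed eviction cost; 0.0 if the block is already free


def cheapest_window(blocks, need):
    """Find the cheapest contiguous window whose total size is >= need.

    Returns (cost, start, end) with inclusive indices, or None if even
    the whole list is too small. Assumes need > 0 and non-negative costs.
    """
    best = None
    start = 0
    size = 0
    cost = 0.0
    for end, block in enumerate(blocks):
        size += block.size
        cost += block.cost
        # Shrink from the left while the window still satisfies `need`;
        # with non-negative costs this never makes the window worse.
        while size - blocks[start].size >= need:
            size -= blocks[start].size
            cost -= blocks[start].cost
            start += 1
        if size >= need and (best is None or cost < best[0]):
            best = (cost, start, end)
    return best
```

For example, with blocks of sizes 4, 2, 3, 4 and costs 5, 0, 1, 9, a request for 5 bytes selects the middle two blocks, which are adjacent and cheapest to reclaim together.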

machine learning, natural language, tensor, (15 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Asia > China (0.04)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.96)
(2 more...)

Mixtape: Breaking the Softmax Bottleneck Efficiently

Zhilin Yang, Thang Luong, Russ R. Salakhutdinov, Quoc V. Le

Neural Information Processing Systems | Feb-12-2026, 04:17:07 GMT

Neural Information Processing Systems http://nips.cc/

arxiv preprint arxiv, batch size, mixtape, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

1cbcaa5abbb6b70f378a3a03d0c26386-Supplemental.pdf

Neural Information Processing Systems | Feb-7-2026, 17:44:45 GMT

cifar-100, memory budget, section 5, (14 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Saarland (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

1cbcaa5abbb6b70f378a3a03d0c26386-Paper.pdf

Neural Information Processing Systems | Feb-7-2026, 17:44:41 GMT

cil task, learning, policy function, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Saarland (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Asia > Singapore (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)

DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment

Sangwoo Kwon, Seong Hoon Seo, Jae W. Lee, Yeonhong Park

arXiv.org Artificial Intelligence | Dec-9-2025

How can we effectively handle queries for on-device large language models (LLMs) with varying runtime constraints, such as latency and accuracy? Multi-scale quantization addresses this challenge by enabling memory-efficient runtime model adaptation of LLMs through the overlaying of multiple model variants quantized to different bitwidths. Meanwhile, an important question still remains open-ended: how can models be properly configured to match a target precision or latency? While mixed-precision offers a promising solution, we take this further by leveraging the key observation that the sensitivity of each layer dynamically changes across decoding steps. Building on this insight, we introduce DP-LLM, a novel mechanism that dynamically assigns precision to each layer based on input values. Experimental results across multiple models and benchmarks demonstrate that DP-LLM achieves a superior performance-latency trade-off, outperforming prior approaches.
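The idea of choosing a layer's precision from its input at each decoding step can be sketched as below. Everything concrete here is an assumption made for illustration: the abstract does not say which signal drives the choice, so the input-norm rule, the `threshold`, the 3-bit and 4-bit options, and the names `quantize`, `pick_bits`, and `run_layer` are hypothetical. The sketch also re-quantizes weights on the fly for simplicity, whereas the abstract describes overlaying pre-quantized model variants.

```python
import math


def quantize(weights, bits):
    """Symmetric uniform quantization of a weight list (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels or 1.0
    return [round(w / scale) * scale for w in weights]


def pick_bits(layer_input, threshold, low_bits=3, high_bits=4):
    """Assumed rule: spend more bits when the input norm is large."""
    norm = math.sqrt(sum(x * x for x in layer_input))
    return high_bits if norm > threshold else low_bits


def run_layer(weights, layer_input, threshold):
    """Toy dot-product 'layer' evaluated at an input-dependent precision."""
    bits = pick_bits(layer_input, threshold)
    w_q = quantize(weights, bits)
    out = sum(w * x for w, x in zip(w_q, layer_input))
    return out, bits
```

Calling `run_layer` once per layer per decoding step yields a precision assignment that can change from step to step, which is the behaviour the abstract attributes to dynamic layer-wise assignment.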

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

arXiv:2508.06041

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
